<html>
<body style="padding:20px;width:650px;font-family:Times">
<title>Columns</title>
<h1>Columns/Pairs Columns</h1>
Use this interface to select the columns shown in the results tables for Sequence and Pair queries. The
sections are collapsible; to expand a section and see its column options,
click the "+". Each column has a short description beside its name in
the interface, so this Help only provides additional information.

<h2>Columns</h2>
 Both "Show All" and "Filters" use these column settings for their
 respective tables. If a results table already exists, use "Refresh Columns" to update its columns.
<h3>General</h3>
General sequence attributes: ID, length, total reads, etc. The available columns
depend on the attributes of the database; e.g. if it is an assembled set, it
will have the column '#buried'.

<h3>Sequence Sets (assembled transcript sets only)</h3>
This needs some explanation. The following three types of databases may be built,
and this column type is only available for the third type:
<ol>
<li>The dataset is from one (already assembled) transcript set with read counts (e.g. Illumina), 
where the Library Read Counts are the imported read counts.
<li>The dataset is from one or more read libraries assembled by TCW (e.g. Sanger or 454), 
where the Library Read Counts are the 
number of reads assembled into each contig.
<li>The assembly included at least one transcript set and may have included 
one or more read libraries.
In this case, the Library Read Counts for the transcripts are the imported ones, 
<br><i>but
how is one to know if multiple sequences from a transcript set are in the contig?</i>
This is what the "Sequence Sets" columns are for: they give the number of sequences in a
contig from each transcript set.
</ol>

<h3>Library</h3>
There will be a column for each library in each transcript set plus each read library.


<h3>N Fold</h3>
Select pairs of libraries to show their fold-change difference as a column. 
The fold change (FC) is calculated from RPKM values, with values FC &lt; 1 written 
as -1/FC. Zeros in the denominator are replaced by 0.001. Click where it says "Select"
to see all libraries, or click one of the arrows to step through the libraries.
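<p>The fold-change convention above can be sketched in Python as follows. This is an illustration, not TCW's code: the function name is hypothetical, the first library of the pair is assumed to be the numerator, and the 0.001 substitution is assumed to apply to whichever RPKM value ends up in the denominator after the -1/FC conversion.

```python
def fold_change(rpkm_a: float, rpkm_b: float, zero: float = 0.001) -> float:
    """Fold change of library A over library B (hypothetical helper, not TCW's code).

    Ratios >= 1 are returned as-is; ratios < 1 are returned as -1/FC,
    i.e. -(rpkm_b / rpkm_a). A zero denominator is replaced by `zero`.
    If both values are zero, this sketch returns 0.0.
    """
    if rpkm_a >= rpkm_b:
        return rpkm_a / (rpkm_b if rpkm_b != 0 else zero)
    # FC < 1: report -1/FC, which puts rpkm_a in the denominator.
    return -(rpkm_b / (rpkm_a if rpkm_a != 0 else zero))


print(fold_change(8.0, 2.0))  # 4.0
print(fold_change(2.0, 8.0))  # -4.0
```

Writing FC &lt; 1 as -1/FC makes a 4-fold increase and a 4-fold decrease display as 4 and -4, so the column is symmetric around zero.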

<h3>Differential Expression P-value</h3>
If your project has differential expression results added using runDE, you can 
select those columns here. Note that newer databases encode up/down regulation in
the sign of the p-value, i.e. p-values of up-regulated sequences are positive and those of down-regulated sequences are negative. 
The results table sorts these signed values appropriately. (If your database is older, you can update the
values by re-running runDE.)
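<p>A minimal Python sketch of interpreting the signed p-values, assuming positive means up-regulated and negative means down-regulated as described above. The helper names are hypothetical, and sorting by absolute value (most significant first) is an assumption about what "sorts appropriately" means, not a description of TCW's code.

```python
def decode_signed_pvalue(p: float) -> tuple[str, float]:
    """Split a signed p-value into (direction, p-value). Hypothetical helper.

    Assumes positive = up-regulated, negative = down-regulated.
    A value of exactly 0 carries no sign, so its direction is unknown.
    """
    if p > 0:
        return "up", p
    if p < 0:
        return "down", -p
    return "unknown", 0.0


def sort_by_significance(pvalues: list[float]) -> list[float]:
    """Order signed p-values by magnitude, most significant first (assumed sort order)."""
    return sorted(pvalues, key=abs)


print(decode_signed_pvalue(-0.003))                      # ('down', 0.003)
print(sort_by_significance([0.04, -0.001, 0.2, -0.03]))  # [-0.001, -0.03, 0.04, 0.2]
```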


<h3>R Statistic</h3>
The R Statistic<sup>1</sup> is a quick way to look for differential expression among multiple libraries at once.
It can be computed either on all libraries, or only on the "Included" libraries specified in Filter Query. 
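<p>For reference, Stekel et al.<sup>1</sup> define the statistic for one sequence j as R<sub>j</sub> = &Sigma;<sub>i</sub> x<sub>ij</sub> ln(x<sub>ij</sub> / (N<sub>i</sub> f<sub>j</sub>)), where x<sub>ij</sub> is the sequence's read count in library i, N<sub>i</sub> is the total read count of library i, and f<sub>j</sub> = &Sigma;<sub>i</sub> x<sub>ij</sub> / &Sigma;<sub>i</sub> N<sub>i</sub> is the pooled frequency. The Python sketch below evaluates that formula directly; it assumes natural logarithms and raw counts, and TCW's own implementation may normalize or scale differently. Restricting to the "Included" libraries corresponds to passing only those libraries.

```python
import math


def r_statistic(counts: list[int], library_sizes: list[int]) -> float:
    """Log-likelihood-ratio R statistic for one sequence (sketch, not TCW's code).

    counts[i]        -- reads for this sequence in library i (x_ij)
    library_sizes[i] -- total reads in library i (N_i)
    """
    if len(counts) != len(library_sizes):
        raise ValueError("need one count per library")
    total = sum(counts)
    if total == 0:
        return 0.0  # sequence absent everywhere: no evidence of a difference
    f = total / sum(library_sizes)  # pooled frequency f_j
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # x * ln(x / ...) tends to 0 as x -> 0, so zero counts add nothing
            r += x * math.log(x / (n * f))
    return r


# Same proportion (10%) in both libraries, so R is 0.
print(r_statistic([10, 20], [100, 200]))
# All reads in the first of two equal-sized libraries, so R = 10 * ln 2.
print(r_statistic([10, 0], [100, 100]))
```

Larger R means the sequence's share of reads varies more across libraries; a sequence expressed in the same proportion everywhere scores 0.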

<h3>Counts of DB Hits</h3>
Annotation-related count columns.

<h3>Best Eval/Best Annotation</h3>
Best Annotation: The hit list is sorted by e-value, and the Best Anno is assigned to the first hit that either (1) is a SwissProt hit, or (2) has a good annotation, meaning its description does not contain a phrase like "uncharacterized protein". A SwissProt hit takes priority over a good annotation if its e-value exponent is within 20% of the first good annotation's exponent, e.g. at least 1E-16 if the good annotation is 1E-20. 
<p>Best Eval: This is almost always the hit with the best e-value; however, if a hit with a better annotation has an e-value exponent within 5% of the best, that hit is chosen instead.
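<p>The Best Annotation rule can be sketched in Python as below. Everything here is illustrative: the <code>Hit</code> record, the phrase list, and the choice to return nothing when no hit qualifies are assumptions, and e-values are assumed to be positive. The 20% test compares the magnitudes of the scientific-notation exponents using integer arithmetic, so 1E-16 passes against 1E-20 exactly as in the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Hypothetical hit record; TCW's own fields may differ."""
    name: str
    evalue: float
    description: str
    is_swissprot: bool


# Illustrative phrase list; the full list of uninformative phrases is not given here.
UNINFORMATIVE = ("uncharacterized protein",)


def exponent(evalue: float) -> int:
    """Base-10 exponent as written in scientific notation, e.g. 1E-20 -> -20."""
    if evalue <= 0:
        raise ValueError("this sketch assumes positive e-values")
    return int(f"{evalue:e}".split("e")[1])


def is_good(hit: Hit) -> bool:
    """A 'good' annotation contains none of the uninformative phrases."""
    text = hit.description.lower()
    return not any(phrase in text for phrase in UNINFORMATIVE)


def best_annotation(hits: list[Hit], percent: int = 20) -> Optional[Hit]:
    ranked = sorted(hits, key=lambda h: h.evalue)  # best e-value first
    for i, hit in enumerate(ranked):
        if hit.is_swissprot:
            return hit  # rule (1): the first qualifying hit is a SwissProt hit
        if is_good(hit):
            # Rule (2) found a good non-SwissProt hit first; a later SwissProt hit
            # still wins if its exponent is within `percent`% of this one's.
            threshold = abs(exponent(hit.evalue)) * (100 - percent)
            for later in ranked[i + 1:]:
                if later.is_swissprot and abs(exponent(later.evalue)) * 100 >= threshold:
                    return later
            return hit
    return None  # no hit qualifies (this fallback is an assumption)


hits = [
    Hit("tr|A", 1e-30, "Uncharacterized protein", False),
    Hit("tr|B", 1e-20, "Heat shock protein 70", False),
    Hit("sp|C", 1e-16, "Heat shock 70 kDa protein", True),
]
print(best_annotation(hits).name)  # sp|C
```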

<h3>SNPs and ORFs</h3>

Note that SNPs are only available if the  assembly was performed within TCW. 

<P>The ORF columns describe the best Open Reading Frame. To accommodate de novo transcripts, which
may have frameshifts or incorrect bases, the algorithm emphasizes
finding the longest region with
no STOP codons rather than an ATG-to-STOP region (unless the ATG-to-STOP region is &gt;=600bp). Precedence is
given to the frame that corresponds to the frame of the protein hit. Only ATG is used as the start codon.
Because of these differences, the ORF may differ from the one found by the NCBI ORF Finder.
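<p>The ORF preference described above can be illustrated with a simplified Python sketch. It is not TCW's algorithm: it scans only the three forward frames, counts the STOP codon in the 600bp ATG-to-STOP length, and interprets "precedence" for the protein hit frame as searching only that frame. All of these are assumptions, and the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def stop_free_regions(seq: str, frame: int):
    """Yield (start, end) of maximal STOP-free codon runs in one frame (end exclusive)."""
    start = i = frame
    while i + 3 <= len(seq):
        if seq[i:i + 3] in STOPS:
            if i > start:
                yield start, i
            start = i + 3
        i += 3
    if i > start:
        yield start, i


def first_atg(seq: str, start: int, end: int):
    """Index of the first in-frame ATG in [start, end), or None."""
    for i in range(start, end, 3):
        if seq[i:i + 3] == "ATG":
            return i
    return None


def find_orf(seq: str, hit_frame=None, min_atg_len: int = 600):
    """Return (frame, start, end) of the chosen forward-strand ORF, end exclusive."""
    seq = seq.upper()
    # Assumption: "precedence" for the protein hit frame means searching only that frame.
    frames = [hit_frame] if hit_frame is not None else [0, 1, 2]
    longest_open = None  # longest STOP-free stretch
    longest_atg = None   # longest ATG..STOP stretch, STOP codon included
    for frame in frames:
        for start, end in stop_free_regions(seq, frame):
            if longest_open is None or end - start > longest_open[2] - longest_open[1]:
                longest_open = (frame, start, end)
            atg = first_atg(seq, start, end)
            if atg is not None and seq[end:end + 3] in STOPS:
                if longest_atg is None or end + 3 - atg > longest_atg[2] - longest_atg[1]:
                    longest_atg = (frame, atg, end + 3)
    # Prefer a long ATG..STOP ORF; otherwise fall back to the longest STOP-free region.
    if longest_atg is not None and longest_atg[2] - longest_atg[1] >= min_atg_len:
        return longest_atg
    return longest_open


seq = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
print(find_orf(seq))                  # (1, 1, 19): STOP-free region wins
print(find_orf(seq, min_atg_len=15))  # (2, 2, 17): ATG..STOP ORF is long enough
```

In this toy sequence the 15bp ATG-to-STOP ORF is shorter than the threshold, so the longer STOP-free stretch in another frame is reported, which is why such an ORF can differ from an ATG-based ORF finder's result.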

<h3>Rounding Numbers</h3>
Various options for rounding, including significant figures and decimal place choices. Note that all calculations
are performed with un-rounded values, and then the result is rounded. To see the exact values stored in the database,
and used in calculations, select "No rounding".
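<p>The difference between significant figures and decimal places, and why only the final result is rounded, can be illustrated with Python's built-in <code>round</code>. The helper names are hypothetical, and TCW's own rounding rules (for example, how ties are broken) may differ from Python's.

```python
import math


def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))


def round_dp(x: float, places: int) -> float:
    """Round x to a fixed number of decimal places (hypothetical helper)."""
    return round(x, places)


values = [1.004, 1.004]
# Calculate with unrounded values, then round the result: 2.008 -> 2.01
print(round_dp(sum(values), 2))
# Rounding the inputs first would lose precision: 1.0 + 1.0 -> 2.0
print(sum(round_dp(v, 2) for v in values))
# Significant figures keep the leading digits regardless of magnitude.
print(round_sig(0.000123456, 3))
```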

<h2>Pairs Columns (Pairs Filter)</h2>

Each column is described beside its name on the Pairs Columns page. 

<p>
<hr>
<p>

<sup>1</sup>Stekel, D.J., Git, Y. and Falciani, F. (2000) 
<br>
The comparison of gene expression from multiple cDNA libraries. Genome Res, 10, 2055-2061.
<p>
Note that the R Statistic is a Poisson-based test that does not account for overdispersion; hence it
can produce false positives on RNA-seq data, especially at low fold change. Results should 
be verified by a more rigorous method (e.g., edgeR, DESeq), but note that these methods
work only on pairwise library comparisons and require replicates to work accurately.  

